Enzyme promiscuity prediction using hierarchy-informed multi-label classification

Abstract

Motivation: As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme’s natural substrates. The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities.

Results: We frame this ‘enzyme promiscuity prediction’ problem as a multi-label classification task. We maximally utilize inhibitor and unlabeled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbors similarity-based and other machine-learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split when compared to a random data split, and when evaluating performance on non-natural substrates compared to natural substrates.

Availability and implementation: We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP.

Supplementary information: Supplementary data are available at Bioinformatics online.
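The hierarchical relationships mentioned in the abstract follow from the structure of EC numbers: each class such as 1.1.1.1 sits beneath its ancestors 1.1.1, 1.1 and 1. The sketch below is illustrative only and is not taken from the EPP repository. It assumes dot-separated EC strings, and the helper names (`ec_ancestors`, `expand_labels`, `enforce_hierarchy`) are hypothetical. It shows two generic ingredients of hierarchy-informed multi-label classification: expanding a molecule's positive labels to include ancestor classes, and capping each predicted score at its parent's score so predictions respect the hierarchy. The capping step is a simple stand-in for illustration, not the mechanism EPP-HMCNF itself uses.

```python
from typing import Dict, List, Set


def ec_ancestors(ec: str) -> List[str]:
    """Return the EC number and all of its ancestors, most general first."""
    parts = ec.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def expand_labels(ec_numbers: Set[str]) -> Set[str]:
    """Add every ancestor class of each positive EC number to the label set."""
    expanded: Set[str] = set()
    for ec in ec_numbers:
        expanded.update(ec_ancestors(ec))
    return expanded


def enforce_hierarchy(scores: Dict[str, float]) -> Dict[str, float]:
    """Cap each class score at its parent's (already capped) score.

    Assumption: this post-processing rule is chosen for illustration only.
    """
    consistent: Dict[str, float] = {}
    # Visit shallower classes first so a parent is final before its children.
    for ec in sorted(scores, key=lambda e: e.count(".")):
        parent = ec.rsplit(".", 1)[0] if "." in ec else None
        score = scores[ec]
        if parent in consistent:
            score = min(score, consistent[parent])
        consistent[ec] = score
    return consistent


# Toy example with made-up scores for a single query molecule.
raw_scores = {"1": 0.6, "1.1": 0.9, "1.1.1": 0.4, "1.1.1.1": 0.7}
print(enforce_hierarchy(raw_scores))
print(sorted(expand_labels({"1.1.1.1", "1.1.2.3"})))
```

After capping, no class scores higher than its parent, so thresholding the scores cannot predict a specific enzyme class without also predicting the classes that contain it.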

Similar resources

Air pollution prediction via multi-label classification

A Bayesian network classifier can be used to estimate the probability of an air pollutant exceeding a certain threshold. Yet multiple predictions are typically required regarding variables which are stochastically dependent, such as ozone measured in multiple stations or assessed according to different indicators. The common practice (independent approach) is to devise an independent classi...

Molecular signatures-based prediction of enzyme promiscuity

MOTIVATION Enzyme promiscuity, a property with practical applications in biotechnology and synthetic biology, has been related to the evolvability of enzymes. At the molecular level, several structural mechanisms have been linked to enzyme promiscuity in enzyme families. However, it is at present unclear to what extent these observations can be generalized. Here, we introduce for the first time...

Multi-Label Informed Feature Selection

Multi-label learning has been extensively studied in the areas of bioinformatics, information retrieval, multimedia annotation, etc. In multi-label learning, each instance is associated with multiple interdependent class labels, and the label information can be noisy and incomplete. In addition, multi-labeled data often has high-dimensional, noisy, irrelevant and redundant features. As an effective d...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As there are often relationships between the labels, extracting these relationships and taking advantage of them during the training or prediction phases ...

Journal

Journal title: Bioinformatics

Year: 2021

ISSN: 1367-4811, 1367-4803

DOI: https://doi.org/10.1093/bioinformatics/btab054